Keyword Search Result

[Keyword] machine learning(172hit)

121-140hit(172hit)

  • Revisiting the Regression between Raw Outputs of Image Quality Metrics and Ground Truth Measurements

    Chanho JUNG  Sanghyun JOO  Do-Won NAM  Wonjun KIM  

     
    PAPER-Image Processing and Video Processing

      Publicized:
    2016/08/08
      Vol:
    E99-D No:11
      Page(s):
    2778-2787

    In this paper, we aim to investigate the potential usefulness of machine learning in image quality assessment (IQA). Most previous studies have focused on designing effective image quality metrics (IQMs), and significant advances have been made in the development of IQMs over the last decade. Here, our goal is to improve prediction outcomes of “any” given image quality metric. We call this the “IQM's Outcome Improvement” problem, in order to distinguish the proposed approach from the existing IQA approaches. We propose a method that focuses on the underlying IQM and improves its prediction results by using machine learning techniques. Extensive experiments have been conducted on three different publicly available image databases. Particularly, through both 1) in-database and 2) cross-database validations, the generality and technological feasibility (in real-world applications) of our machine-learning-based algorithm have been evaluated. Our results demonstrate that the proposed framework improves prediction outcomes of various existing commonly used IQMs (e.g., MSE, PSNR, SSIM-based IQMs, etc.) in terms of not only prediction accuracy, but also prediction monotonicity.
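
For readers unfamiliar with the metrics named in the abstract above, here is a minimal sketch of two of them, MSE and PSNR, computed on flat lists of pixel values. The function names and the 8-bit peak value of 255 are assumptions chosen for illustration; the sketch shows only the raw metric outputs, not the regression method the paper proposes.

```python
import math


def mse(reference, distorted):
    """Mean squared error between two equally sized pixel sequences."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)


def psnr(reference, distorted, max_value=255.0):
    """Peak signal-to-noise ratio in dB; max_value=255 assumes 8-bit images."""
    err = mse(reference, distorted)
    if err == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10 * math.log10(max_value ** 2 / err)
```

A raw score such as `psnr(ref, dist)` is the kind of metric output that would then be mapped to subjective quality scores by a learned regressor.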

  • A Machine Learning Model for Wide Area Network Intelligence with Application to Multimedia Service

    Yiqiang SHENG  Jinlin WANG  Yi LIAO  Zhenyu ZHAO  

     
    PAPER

      Vol:
    E99-B No:11
      Page(s):
    2263-2270

    Network intelligence is a discipline that builds on the capabilities of network systems to act intelligently by the usage of network resources for delivering high-quality services in a changing environment. Wide area network intelligence is a class of network intelligence in wide area network which covers the core and the edge of Internet. In this paper, we propose a system based on machine learning for wide area network intelligence. The whole system consists of a core machine for pre-training and many terminal machines to accomplish faster responses. Each machine is one of dual-hemisphere models which are made of left and right hemispheres. The left hemisphere is used to improve latency by terminal response and the right hemisphere is used to improve communication by data generation. In an application on multimedia service, the proposed model is superior to the latest deep feed forward neural network in the data center with respect to the accuracy, latency and communication. Evaluation shows scalable improvement with regard to the number of terminal machines. Evaluation also shows the cost of improvement is longer learning time.

  • The Novel Performance Evaluation Method of the Fingerprinting-Based Indoor Positioning

    Shutchon PREMCHAISAWATT  Nararat RUANGCHAIJATUPON  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2016/05/17
      Vol:
    E99-D No:8
      Page(s):
    2131-2139

    In this work, a novel fingerprinting evaluation parameter, called the punishment cost, is proposed. This parameter can be calculated from the designed matrix, the punishment matrix, and the confusion matrix. The punishment cost can describe how well the positioning result falls within the designated grid, which the conventional parameter, accuracy, cannot describe. The experiment is done with real measured data on weekdays and weekends. The results are considered in terms of accuracy and the punishment cost. Three well-known machine learning algorithms, i.e., Decision Tree, k-Nearest Neighbors, and Artificial Neural Network, are verified in fingerprinting positioning. In the experimental environment, Decision Tree performs well on the data from weekends, whereas its performance is underrated on the data from weekdays. The k-Nearest Neighbors has proper punishment costs, even though it has lower accuracy than that of Artificial Neural Network, which has moderate accuracies but lower punishment costs. Therefore, other criteria should be considered in order to select the algorithm for indoor positioning. In addition, the punishment cost can facilitate the conversion of spot positioning to floor positioning without data modification.

  • Guide Automatic Vectorization by means of Machine Learning: A Case Study of Tensor Contraction Kernels

    Antoine TROUVÉ  Arnaldo J. CRUZ  Kazuaki J. MURAKAMI  Masaki ARAI  Tadashi NAKAHIRA  Eiji YAMANAKA  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2016/03/22
      Vol:
    E99-D No:6
      Page(s):
    1585-1594

    Modern optimizing compilers tend to be conservative and often fail to vectorize programs that would have benefited from it. In this paper, we propose a way to predict the relevant command-line options of the compiler so that it chooses the most profitable vectorization strategy. Machine learning has proven to be a relevant approach for this matter: fed with features that describe the software to the compiler, a machine learning device is trained to predict an appropriate optimization strategy. The related work relies on the control and data flow graphs as software features. In this article, we consider tensor contraction programs, useful in various scientific simulations, especially chemistry. Depending on how they access the memory, different tensor contraction kernels may yield very different performance figures. However, they exhibit identical control and data flow graphs, making them completely out of reach of the related work. In this paper, we propose an original set of software features that capture the important properties of the tensor contraction kernels. Considering the Intel Merom processor architecture with the Intel Compiler, we model the problem as a classification problem and we solve it using a support vector machine. Our technique predicts the best suited vectorization options of the compiler with a cross-validation accuracy of 93.4%, leading to up to a 3-times speedup compared to the default behavior of the Intel Compiler. This article ends with an original qualitative discussion on the performance of software metrics by means of visualization. All our measurements are made available for the sake of reproducibility.

  • Automating URL Blacklist Generation with Similarity Search Approach

    Bo SUN  Mitsuaki AKIYAMA  Takeshi YAGI  Mitsuhiro HATADA  Tatsuya MORI  

     
    PAPER-Web security

      Publicized:
    2016/01/13
      Vol:
    E99-D No:4
      Page(s):
    873-882

    Modern web users may encounter a browser security threat called drive-by-download attacks when surfing the Internet. Drive-by-download attacks make use of exploit code to take control of a user's web browser. Many web users do not take such underlying threats into account while clicking URLs. A URL blacklist is one of the practical approaches to thwarting browser-targeted attacks. However, a URL blacklist cannot cope with previously unseen malicious URLs. Therefore, to make a URL blacklist effective, it is crucial to keep the URLs updated. Given these observations, we propose a framework called automatic blacklist generator (AutoBLG) that automates the collection of new malicious URLs by starting from a given existing URL blacklist. The primary mechanism of AutoBLG is expanding the search space of web pages while reducing the number of URLs to be analyzed by applying several pre-filters such as similarity search to accelerate the process of generating blacklists. AutoBLG consists of three primary components: URL expansion, URL filtration, and URL verification. Through extensive analysis using a high-performance web client honeypot, we demonstrate that AutoBLG can successfully discover new and previously unknown drive-by-download URLs from the vast web space.

  • Incorporation of Target Specific Knowledge for Sentiment Analysis on Microblogging

    Yongyos KAEWPITAKKUN  Kiyoaki SHIRAI  

     
    PAPER

      Publicized:
    2016/01/14
      Vol:
    E99-D No:4
      Page(s):
    959-968

    Sentiment analysis of microblogging has become an important classification task because a large amount of user-generated content is published on the Internet. In Twitter, it is common that a user expresses several sentiments in one tweet. Therefore, it is important to classify the polarity not of the whole tweet but of a specific target about which people express their opinions. Moreover, the performance of the machine learning approach greatly depends on the domain of the training data and it is very time-consuming to manually annotate a large set of tweets for a specific domain. In this paper, we propose a method for sentiment classification at the target level by incorporating the on-target sentiment features and user-aware features into the classifier trained automatically from the data created for the specific target. An add-on lexicon, extended target list, and competitor list are also constructed as knowledge sources for the sentiment analysis. None of the processes in the proposed framework require manual annotation. The results of our experiment show that our method is effective and improves on the performance of sentiment classification compared to the baselines.

  • Combining Human Action Sensing of Wheelchair Users and Machine Learning for Autonomous Accessibility Data Collection

    Yusuke IWASAWA  Ikuko EGUCHI YAIRI  Yutaka MATSUO  

     
    PAPER-Rehabilitation Engineering and Assistive Technology

      Publicized:
    2016/01/22
      Vol:
    E99-D No:4
      Page(s):
    1153-1161

    The recent increase in the use of intelligent devices such as smartphones has enhanced the relationship between daily human behavior sensing and useful applications in ubiquitous computing. This paper proposes a novel method inspired by personal sensing technologies for collecting and visualizing road accessibility at lower cost than traditional data collection methods. To evaluate the methodology, we recorded outdoor activities of nine wheelchair users for approximately one hour each by using an accelerometer on an iPod touch and a camcorder, gathered the supervised data from the video by hand, and estimated the wheelchair actions as a measure of street level accessibility in Tokyo. The system detected curb climbing, moving on tactile indicators, moving on slopes, and stopping, with F-scores of 0.63, 0.65, 0.50, and 0.91, respectively. In addition, we conducted experiments with an artificially limited number of training data to investigate the number of samples required to estimate the target.
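
The F-scores quoted in the abstract above are harmonic means of precision and recall. The following minimal sketch shows that calculation from raw detection counts; the function name and the example counts in the usage note are hypothetical and are not taken from the paper.

```python
def f_score(tp: int, fp: int, fn: int) -> float:
    """Return the F1 score from true-positive, false-positive and false-negative counts."""
    if tp == 0:
        return 0.0  # no correct detections: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, a detector with 8 correct detections, 2 false alarms and 2 misses has precision and recall of 0.8 each, so `f_score(8, 2, 2)` is also 0.8.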

  • A RAT Detection Method Based on Network Behavior of the Communication's Early Stage

    Dan JIANG  Kazumasa OMOTE  

     
    PAPER

      Vol:
    E99-A No:1
      Page(s):
    145-153

    A Remote Access Trojan (RAT) is spyware that can steal confidential information from a target organization. The detection of RATs is becoming more and more difficult because of targeted attacks, since the victim usually cannot realize that he/she is being attacked. After a RAT's intrusion, the attacker can monitor and control the victim's PC remotely and wait for an opportunity to steal the confidential information. Given this situation, the main issue we face now is how to prevent confidential information from being leaked back to the attacker. Although there are many existing approaches to RAT detection, two challenges remain: to detect RAT sessions as early as possible, and to distinguish them from normal applications with high accuracy. In this paper, we propose a novel approach to detect RAT sessions by their network behavior during the early stage of communication. The early stage is defined as a short period of time at the beginning of the communication; it can also be seen as the preparation period of the communication. We extract network behavior features from this period to differentiate RAT sessions from normal sessions. For the implementation and evaluation, we use machine learning techniques with 5 algorithms and K-Fold cross-validation. As a result, our approach could detect RAT sessions in the early stage of communication with an accuracy of over 96% and an FNR of 10% using the Random Forest algorithm.
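
The evaluation protocol mentioned in the abstract above (K-fold cross-validation, reported with a false negative rate) can be sketched as follows. This is a generic illustration only: the helper names, the contiguous fold layout, and the choice of k are assumptions, since the abstract does not state the value of K or how the sessions were partitioned.

```python
def k_fold_indices(n_samples: int, k: int):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds get one additional sample each.
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def false_negative_rate(tp: int, fn: int) -> float:
    """FNR = FN / (FN + TP): the fraction of positive sessions that were missed."""
    return fn / (fn + tp)
```

Each sample appears in exactly one test fold, so every session is scored once by a model that never saw it during training.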

  • Using Bregmann Divergence Regularized Machine for Comparison of Molecular Local Structures

    Raissa RELATOR  Nozomi NAGANO  Tsuyoshi KATO  

     
    LETTER-Artificial Intelligence, Data Mining

      Publicized:
    2015/10/06
      Vol:
    E99-D No:1
      Page(s):
    275-278

    Although many 3D structures have been solved for proteins to date, functions of some proteins remain unknown. To predict protein functions, comparison of local structures of proteins with pre-defined model structures, whose functions have been elucidated, is widely performed. For the comparison, the root mean square deviation (RMSD) has been used as a conventional index. In this work, adaptive deviation was incorporated, along with Bregmann Divergence Regularized Machine, in order to detect analogous local structures with such model structures more effectively than the conventional index.

  • Penalized AdaBoost: Improving the Generalization Error of Gentle AdaBoost through a Margin Distribution

    Shuqiong WU  Hiroshi NAGAHASHI  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2015/08/13
      Vol:
    E98-D No:11
      Page(s):
    1906-1915

    Gentle AdaBoost is widely used in object detection and pattern recognition due to its efficiency and stability. To focus on instances with small margins, Gentle AdaBoost assigns larger weights to these instances during the training. However, misclassification of small-margin instances can still occur, which will cause the weights of these instances to become larger and larger. Eventually, several large-weight instances might dominate the whole data distribution, encouraging Gentle AdaBoost to choose weak hypotheses that fit only these instances in the late training phase. This phenomenon, known as “classifier distortion”, degrades the generalization error and can easily lead to overfitting since the deviation of all selected weak hypotheses is increased by the late-selected ones. To solve this problem, we propose a new variant which we call “Penalized AdaBoost”. In each iteration, our approach not only penalizes the misclassification of instances with small margins but also restrains the weight increase for instances with minimal margins. Our method performs better than Gentle AdaBoost because it avoids the “classifier distortion” effectively. Experiments show that our method achieves far lower generalization errors and a similar training speed compared with Gentle AdaBoost.
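
The weight growth described in the abstract above can be seen in the standard Gentle AdaBoost reweighting rule, w_i <- w_i * exp(-y_i * f(x_i)) followed by normalization. The sketch below applies that rule to four instances, one of which is misclassified in every round. The labels and weak-hypothesis outputs are made-up values for illustration, and the code shows plain Gentle AdaBoost reweighting, not the Penalized AdaBoost variant the paper proposes.

```python
import math


def update_weights(weights, labels, outputs):
    """One Gentle AdaBoost reweighting step followed by normalization."""
    raw = [w * math.exp(-y * f) for w, y, f in zip(weights, labels, outputs)]
    total = sum(raw)
    return [r / total for r in raw]


weights = [0.25, 0.25, 0.25, 0.25]
labels = [1, 1, -1, -1]
# Hypothetical weak-hypothesis outputs: the last instance is always misclassified.
outputs = [0.8, 0.8, -0.8, 0.8]

for _ in range(5):
    weights = update_weights(weights, labels, outputs)
```

After only five rounds the repeatedly misclassified instance holds almost all of the weight mass, which is the situation the abstract calls "classifier distortion".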

  • Boosted Random Forest

    Yohei MISHINA  Ryuei MURATA  Yuji YAMAUCHI  Takayoshi YAMASHITA  Hironobu FUJIYOSHI  

     
    PAPER

      Publicized:
    2015/06/22
      Vol:
    E98-D No:9
      Page(s):
    1630-1636

    Machine learning is used in various fields and demand for implementations is increasing. Within machine learning, a Random Forest is a multi-class classifier with high-performance classification, achieved using bagging and feature selection, and is capable of high-speed training and classification. However, as a type of ensemble learning, Random Forest determines classifications using the majority of multiple trees, so many decision trees must be built. Performance increases with the number of decision trees, requiring memory, and decreases if the number of decision trees is decreased. Because of this, the algorithm is not well suited to implementation on small-scale hardware such as an embedded system. As such, we have proposed Boosted Random Forest, which introduces a boosting algorithm into the Random Forest learning method to produce high-performance decision trees that are smaller. When evaluated using databases from the UCI Machine Learning Repository, Boosted Random Forest achieved performance as good as or better than ordinary Random Forest, while reducing memory use by 47%. Thus, it is suitable for implementing Random Forests on embedded hardware with limited memory.

  • Classification of Electromagnetic Radiation Source Models Based on Directivity with the Method of Machine Learning

    Zhuo LIU  Dan SHI  Yougang GAO  Junjian BI  Zhiliang TAN  Jingjing SHI  

     
    PAPER

      Vol:
    E98-B No:7
      Page(s):
    1227-1234

    This paper presents a new way to classify different radiation sources by the parameter of directivity, which is a characteristic parameter of electromagnetic radiation sources. The parameter can be determined from measurements of the electric field intensity radiating in all directions in space. We develop three basic antenna models, which are for 3GHz operation, and set 125,000 groups of cube receiving arrays along the main lobe of their radiation patterns to receive the data of far field electric intensity in groups. Then the Back Propagation (BP) neural network and the Support Vector Machine (SVM) method are adopted to analyze the training data set, and to build and test the classification model. Owing to its powerful nonlinear simulation ability, the SVM method offers higher classification accuracy than the BP neural network in a noisy environment. Finally, the classification model is comprehensively evaluated in three aspects: capability of noise immunity, F1 measure and the normalization method.

  • Multiple Binary Codes for Fast Approximate Similarity Search

    Shinichi SHIRAKAWA  

     
    PAPER-Pattern Recognition

      Publicized:
    2014/12/11
      Vol:
    E98-D No:3
      Page(s):
    671-680

    One of the fast approximate similarity search techniques is a binary hashing method that transforms a real-valued vector into a binary code. The similarity between two binary codes is measured by their Hamming distance. In this method, a hash table is often used when undertaking a constant-time similarity search. The number of accesses to the hash table, however, increases when the number of bits lengthens. In this paper, we consider a method that does not access data with a long Hamming radius by using multiple binary codes. Further, we attempt to integrate the proposed approach and the existing multi-index hashing (MIH) method to accelerate the performance of the similarity search in the Hamming space. Then, we propose a learning method of the binary hash functions for multiple binary codes. We conduct an experiment on similarity search utilizing a dataset of up to 50 million items and show that our proposed method achieves a faster similarity search than that possible with the conventional linear scan and hash table search.
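
To make concrete why hash-table accesses grow with code length and search radius, as the abstract above notes, here is a minimal sketch that computes Hamming distance and enumerates every bucket that must be probed to find all codes within a given radius. Binary codes are represented as Python integers; the function names are illustrative assumptions, and this is the plain single-table lookup, not the paper's multiple-code method or MIH.

```python
from itertools import combinations


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two binary codes differ."""
    return bin(a ^ b).count("1")


def codes_within_radius(code: int, n_bits: int, radius: int):
    """Yield every n_bits-bit code whose Hamming distance from `code` is <= radius."""
    for r in range(radius + 1):
        for positions in combinations(range(n_bits), r):
            flipped = code
            for p in positions:
                flipped ^= 1 << p  # flip bit p
            yield flipped
```

The number of probes is the sum of the binomial coefficients C(n_bits, r) for r from 0 to the radius, so it rises quickly as either the code length or the radius increases.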

  • Predicting Vectorization Profitability Using Binary Classification

    Antoine TROUVÉ  Arnaldo J. CRUZ  Dhouha BEN BRAHIM  Hiroki FUKUYAMA  Kazuaki J. MURAKAMI  Hadrien CLARKE  Masaki ARAI  Tadashi NAKAHIRA  Eiji YAMANAKA  

     
    PAPER-Software System

      Publicized:
    2014/08/27
      Vol:
    E97-D No:12
      Page(s):
    3124-3132

    Basic block vectorization consists in realizing instruction-level parallelism inside basic blocks in order to generate SIMD instructions and thus speed up data processing. It is however problematic, because the vectorized program may actually be slower than the original one. Therefore, it would be useful to predict beforehand whether or not vectorization will actually produce any speedup. This paper proposes to do so by expressing vectorization profitability as a classification problem, and by predicting it using a machine learning technique called support vector machine (SVM). It considers three compilers (icc, gcc and LLVM), and a benchmark suite made of 151 loops, unrolled with factors ranging from 1 to 20. The paper further proposes a technique that combines the results of two SVMs to reach 99% accuracy for all three compilers. Moreover, by correctly predicting unprofitable vectorizations, the technique presented in this paper provides speedups of up to 2.16 times, 2.47 times and 3.83 times for icc, gcc and LLVM, respectively (9%, 18% and 56% on average). It also lowers to less than 1% the probability of the compiler generating a slower program with vectorization turned on (from more than 25% for the compilers alone).

  • Balanced Neighborhood Classifiers for Imbalanced Data Sets

    Shunzhi ZHU  Ying MA  Weiwei PAN  Xiatian ZHU  Guangchun LUO  

     
    LETTER-Pattern Recognition

      Vol:
    E97-D No:12
      Page(s):
    3226-3229

    A Balanced Neighborhood Classifier (BNEC) is proposed for class imbalanced data. This method is not only well positioned to capture the class distribution information, but also has the good merits of high-fitting-performance and simplicity. Experiments on both synthetic and real data sets show its effectiveness.

  • Edge-over-Erosion Error Prediction Method Based on Multi-Level Machine Learning Algorithm

    Daisuke FUKUDA  Kenichi WATANABE  Naoki IDANI  Yuji KANAZAWA  Masanori HASHIMOTO  

     
    PAPER-Device and Circuit Modeling and Analysis

      Vol:
    E97-A No:12
      Page(s):
    2373-2382

    As VLSI process nodes continue to shrink, the chemical mechanical planarization (CMP) process for copper interconnects has become an essential technique for enabling many-layer interconnection. Recently, Edge-over-Erosion error (EoE-error), which originates from overpolishing and could cause yield loss, has been observed in various CMP processes, while its mechanism is still unclear. To predict these errors, we propose an EoE-error prediction method that exploits machine learning algorithms. The proposed method consists of (1) an error analysis stage, (2) a layout parameter extraction stage, (3) a model construction stage and (4) a prediction stage. In the error analysis and parameter extraction stages, we analyze test chips and identify layout parameters which have an impact on the EoE phenomenon. In the model construction stage, we construct a prediction model using the proposed multi-level machine learning method, and make predictions for designed layouts in the prediction stage. Experimental results show that the proposed method attained 2.7∼19.2% accuracy improvement of EoE-error prediction and 0.8∼10.1% improvement of non-EoE-error prediction compared with general machine learning methods. The proposed method makes it possible to prevent unexpected yield loss by recognizing EoE-errors before manufacturing.

  • Unsupervised Learning Model for Real-Time Anomaly Detection in Computer Networks

    Kriangkrai LIMTHONG  Kensuke FUKUDA  Yusheng JI  Shigeki YAMADA  

     
    PAPER-Information Network

      Vol:
    E97-D No:8
      Page(s):
    2084-2094

    Detecting a variety of anomalies caused by attacks or accidents in computer networks has been one of the real challenges for both researchers and network operators. An effective technique that could quickly and accurately detect a wide range of anomalies would be able to prevent serious consequences for system security or reliability. In this article, we characterize detection techniques on the basis of learning models and propose an unsupervised learning model for real-time anomaly detection in computer networks. We also conducted a series of experiments to examine capabilities of the proposed model by employing three well-known machine learning algorithms, namely multivariate normal distribution, k-nearest neighbor, and one-class support vector machine. The results of these experiments on real network traffic suggest that the proposed model is a promising solution and has a number of flexible capabilities to detect several types of anomalies in real time.

  • Tree-Based Ensemble Multi-Task Learning Method for Classification and Regression

    Jaak SIMM  Ildefons MAGRANS DE ABRIL  Masashi SUGIYAMA  

     
    LETTER-Pattern Recognition

      Vol:
    E97-D No:6
      Page(s):
    1677-1681

    Multi-task learning is an important area of machine learning that tries to learn multiple tasks simultaneously to improve the accuracy of each individual task. We propose a new tree-based ensemble multi-task learning method for classification and regression (MT-ExtraTrees), based on Extremely Randomized Trees. MT-ExtraTrees is able to share data between tasks minimizing negative transfer while keeping the ability to learn non-linear solutions and to scale well to large datasets.

  • An Accurate Packer Identification Method Using Support Vector Machine

    Ryoichi ISAWA  Tao BAN  Shanqing GUO  Daisuke INOUE  Koji NAKAO  

     
    PAPER-Foundations

      Vol:
    E97-A No:1
      Page(s):
    253-263

    PEiD is a packer identification tool widely used for malware analysis, but its accuracy has recently been declining. There are two major reasons for that. The first is that PEiD does not provide a way to create signatures, though it adopts a signature-based approach. We need to create signatures manually, and it is difficult to keep up with packers that are created or upgraded rapidly. The second is that PEiD utilizes exact matching. If a signature contains any error, PEiD cannot identify the packer that corresponds to the signature. In this paper, we propose a new automated packer identification method to overcome the limitations of PEiD and report the results of our numerical study. Our method applies a string-kernel-based support vector machine (SVM): it can measure the similarity between packed programs without manual operations such as creating signatures, and it provides an error-tolerant mechanism that can significantly reduce detection failures caused by minor signature violations. In addition, we use the byte sequence starting from the entry point of a packed program as a packer's feature given to the SVM. That is, our method combines the advantages of the signature-based approach and the machine learning (ML) based approach. The numerical results on 3902 samples with 26 packer classes and 3 unpacked (not-packed) classes show that our method achieves a high accuracy of 99.46%, outperforming PEiD and an existing ML-based method that Sun et al. have proposed.

  • Security Evaluation of RG-DTM PUF Using Machine Learning Attacks

    Mitsuru SHIOZAKI  Kousuke OGAWA  Kota FURUHASHI  Takahiko MURAYAMA  Masaya YOSHIKAWA  Takeshi FUJINO  

     
    PAPER-Hardware Based Security

      Vol:
    E97-A No:1
      Page(s):
    275-283

    In modern hardware security applications, silicon physical unclonable functions (PUFs) are of interest for their potential use as a unique identity or secret key that is generated from inherent characteristics caused by process variations. However, arbiter-based PUFs utilizing the relative delay-time difference between equivalent paths have a security issue in which the generated challenge-response pairs (CRPs) can be predicted by a machine learning attack. We previously proposed the RG-DTM PUF, in which a response is decided from divided time domains allocated to response 0 or 1, to improve the uniqueness of the conventional arbiter-PUF in a small circuit. However, its resistance against machine learning attacks has not yet been studied. In this paper, we evaluate the resistance against machine learning attacks by using a support vector machine (SVM) and logistic regression (LR) in both simulations and measurements and compare the RG-DTM PUF with the conventional arbiter-PUF and with the XOR arbiter-PUF, which strengthens the resistance by using XORing output from multiple arbiter-PUFs. In numerical simulations, prediction rates using both SVM and LR were above 90% within 1,000 training CRPs on the arbiter-PUF. The machine learning attack using the SVM could never predict responses on the XOR arbiter-PUF with over six arbiter-PUFs, whereas the prediction rate eventually reached 95% using the LR and many training CRPs. On the RG-DTM PUF, when the division number of the time domains was over eight, the prediction rates using the SVM were equal to the probability by guess. The machine learning attack using LR has the potential to predict responses, although an adversary would need to steal a significant amount of CRPs. However, the resistance can exponentially be strengthened with an increase in the division number, just like with the XOR arbiter-PUF. Over one million CRPs are required to attack the 16-divided RG-DTM PUF. 
    Differences between the RG-DTM PUF and the XOR arbiter-PUF relate to the area penalty and the power penalty. Specifically, the XOR arbiter-PUF has to make up for resistance against machine learning attacks by increasing the circuit area, while the RG-DTM PUF is resistant against machine learning attacks with less area penalty and power penalty since only capacitors are added to the conventional arbiter-PUF. We also attacked RG-DTM PUF chips, which were fabricated with 0.18-µm CMOS technology, to evaluate the effect of physical variations and unstable responses. The resistance against machine learning attacks was related to the delay-time difference distribution, but unstable responses had little influence on the attack results.